We present the topometric MST method to search for clusters of photons in the LAT sky, which was used to obtain the seed list for the compilation of the First LAT catalog. This method works well in non-dense fields and can be profitably used at energies higher than a few GeV. We describe the particular techniques we developed to improve the cluster selection criteria and the estimate of the astronomical coordinates of the possibly associated gamma-ray sources. A simulation technique to evaluate the confidence level of the source detection is presented.
1. γ-ray source detection with the Minimal Spanning Tree

The Minimal Spanning Tree (MST) is a topometric algorithm useful for searching for clusters in a field of points (or nodes). It was used by our group for detecting clusters of photons in γ-ray images to obtain a list of candidate sources for the 11-month LAT catalog. The application of MST to LAT images, however, is not straightforward and presents some problems, two of which are discussed in this contribution. The first problem concerns the accuracy of the source coordinates derived from MST, which is relevant for a safe identification of possible counterparts of γ-ray sources. The second problem is that of estimating the significance of detection of the "selected" source candidates. We first present a short review of MST; for a more complete description see Campana et al. (2008) [1].

Given a set of N points, or nodes, in a multidimensional space, we can compute the set $\{\lambda_i\}$ of weighted edges connecting them: the MST (Zahn, 1971) is the tree (i.e. a graph without closed loops) connecting all the nodes with the minimum total weight $\min[\sum_i \lambda_i]$. For a set of points in a Cartesian frame, the edges are the lines joining the nodes, weighted by their length.

We divided the LAT sky into a few regions to take into account the presence of the Galactic emission and considered the photon arrival directions as nodes in a two-dimensional graph, the edge weight being the angular distance between them. Then we computed the MST for each region by means of the Prim algorithm, which grows a tree by adding, at each step, the node nearest to those already in the tree, starting from one of the photons in the list. Fig. 1 shows a field of the LAT sky, at energies higher than 4 GeV, with the MST that connects the events' positions.
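The construction described above can be sketched in a few lines. This is only an illustrative O(N²) version of the Prim algorithm, not the code used for the catalog; the function names `angular_distance` and `prim_mst`, and the representation of photons as (longitude, latitude) pairs in degrees, are assumptions made for the example.

```python
import math

def angular_distance(p, q):
    """Great-circle separation in degrees between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    # Haversine form, numerically stable at small separations
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def prim_mst(points):
    """Return the MST of the points as a list of (i, j, length) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each node to the tree
    parent = [-1] * n       # tree node realising that distance
    best[0] = 0.0           # start from the first photon in the list
    edges = []
    for _ in range(n):
        # pick the node outside the tree that is nearest to it
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # update the distances of the remaining nodes to the enlarged tree
        for v in range(n):
            if not in_tree[v]:
                d = angular_distance(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

For a field of N photons the result contains N-1 edges. The quadratic cost is acceptable for a sketch, but a large field would call for a spatial index or a library routine.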



To extract only the locations where the photons cluster, the following operations can be performed: i) separation: remove all the edges having a length $\lambda > \Lambda_{\rm cut}$, the separation value, defined in units of the mean edge length $\Lambda_m = (\sum_i \lambda_i)/N$ in the MST; we thus obtain a set of disconnected sub-trees; ii) elimination: remove all the sub-trees having a number of nodes $N < N_{\rm cut}$, so that we remove small chance clusters of photons, leaving only the clusters having a size over the expected flux limit. After the application of these filters, the remaining set of sub-trees $\{S_k\}$ provides a first list of candidate γ-ray sources.
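As an illustration of the two filters, the following sketch takes the MST edges as (i, j, length) triples and returns the surviving sub-trees as lists of node indices. The function name `select_clusters` and its parameters are hypothetical. The mean edge length is normalised by N as written in the text; dividing by the number of edges, N-1, would differ negligibly for large N.

```python
from collections import defaultdict

def select_clusters(n_nodes, edges, cut_factor, n_cut):
    """Apply separation and elimination to MST edges given as (i, j, length)."""
    # mean edge length Lambda_m, normalised by N as in the text
    mean_len = sum(w for _, _, w in edges) / n_nodes

    # separation: drop every edge longer than Lambda_cut = cut_factor * Lambda_m
    kept = [(i, j, w) for i, j, w in edges if w <= cut_factor * mean_len]

    # group the nodes into disconnected sub-trees with a union-find structure
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in kept:
        parent[find(i)] = find(j)

    groups = defaultdict(list)
    for node in range(n_nodes):
        groups[find(node)].append(node)

    # elimination: keep only the sub-trees with at least n_cut nodes
    return [sorted(g) for g in groups.values() if len(g) >= n_cut]
```

With `cut_factor=0.8` and a suitable `n_cut`, the call would correspond to the kind of selection quoted in the caption of Fig. 1.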



Campana et al. (2008) [1] considered two quantities useful to evaluate the "goodness" of the clusters selected by MST: the number of nodes in the cluster $n_k$ and the clustering parameter $g_k$, defined as the ratio between $\Lambda_m$ and $\lambda_{m,k}$, the mean length of the k-th cluster edges. Practical application suggested the use, instead of $n_k$, of the quantity $M_k = n_k g_k$, which we named magnitude and which combines the effects of the number of nodes with their clustering.
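A small worked example may help to fix the definitions. The function name `cluster_quality` is hypothetical, and the sketch relies only on the fact that a sub-tree with n nodes has n-1 edges.

```python
def cluster_quality(cluster_edge_lengths, mean_edge_length):
    """Return (g_k, M_k) for one cluster.

    cluster_edge_lengths: lengths of the edges of the k-th sub-tree
    mean_edge_length:     Lambda_m, the mean edge length of the whole MST
    """
    n_k = len(cluster_edge_lengths) + 1            # a tree with n nodes has n-1 edges
    lam_mk = sum(cluster_edge_lengths) / len(cluster_edge_lengths)
    g_k = mean_edge_length / lam_mk                # clustering parameter
    m_k = n_k * g_k                                # magnitude
    return g_k, m_k

# Five photons joined by edges of 0.25, 0.25, 0.5 and 0.5 degrees
# in a field whose mean edge length is 1.5 degrees.
g, m = cluster_quality([0.25, 0.25, 0.5, 0.5], 1.5)
```

Here the mean cluster edge is 0.375°, so g = 4 and M = 5 × 4 = 20: a compact cluster gains magnitude both from its number of photons and from how tightly they are packed.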



The MST is a 1-dimensional structure embedded in a 2 (or more)-dimensional space and therefore it can produce poor clusters that are not symmetric around the centroid and are elongated along the local tree direction. This pattern does not fit the instrumental PSF, and these clusters cannot be accepted as genuine candidate γ-ray sources. Optimized criteria have to be used to rule out these clusters.

Figure 1: Left panel: A region (20° × 20°) of the 11-month LAT sky at energies >4 GeV centered at l=150°, b=25°. Right panel: the MST between the photons and the clusters (in red) found by applying the selection criteria $\Lambda_{\rm cut} = 0.8\,\Lambda_m$ and $N_{\rm cut} > 3$. Green circles mark the candidate γ-ray sources after the selection on g and M; blue dots mark the positions of sources in the LAT catalog.



2. The accuracy of sources' coordinates 

A good estimate of the sources' positions in the sky is very important for the identification of possible counterparts. The simple arithmetic mean of the coordinates of all photons belonging to a selected cluster can fail to provide a satisfactory location for different reasons: (i) the cluster is in a relatively high background region and it extends in a particular direction with some nodes well outside the PSF; (ii) connection and/or proximity with another cluster; (iii) small number of nodes; (iv) high number of nodes but moderate clustering.

The estimate of the cluster centroids can be improved by using suitable "weights" in averaging the coordinates of the nodes or by selecting the nodes to be used. We tried several approaches and looked for the one producing the smallest differences with respect to the 11-month LAT catalog positions for a sample of 150 sources, chosen with different numbers of photons and degrees of clustering. The approaches adopted for the photon coordinates were: i) weighting by the inverse of a power of the distance to the nearest photon; ii) elimination of some of the photons most distant from the first centroid.

The mean angular difference between the LAT catalog positions (used as reference) and those of the MST clusters in the considered sample was 0°.111. The application of the inverse square distance weight reduced the mean difference to 0°.081, while the latter of the above methods gave a slightly greater value, around 0°.095. The chosen solution, also convenient for computational time, was that of weighting with a power of $1/\lambda$ equal to 2.
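The inverse-square weighting can be sketched as follows. The example assumes a flat (tangent-plane) approximation with photon positions given as (x, y) offsets in degrees, which is reasonable only for clusters much smaller than the field; the function name `weighted_centroid` and the `min_dist` floor, which guards against coincident photons, are assumptions introduced for the illustration.

```python
import math

def weighted_centroid(points, power=2.0, min_dist=1e-6):
    """Centroid of (x, y) points, each weighted by 1 / d**power,
    where d is the distance to the nearest other point of the cluster."""
    if len(points) == 1:
        return points[0]
    weights = []
    for i, (xi, yi) in enumerate(points):
        # distance from photon i to its nearest neighbour in the cluster
        d = min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        weights.append(1.0 / max(d, min_dist) ** power)
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, points)) / total
    y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return x, y

# Four tightly packed photons plus one outlier about a degree away.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0)]
x, y = weighted_centroid(cluster)
```

In this toy cluster the plain arithmetic mean is pulled to x = 0.24 by the outlier, while the weighted centroid stays close to the centre of the compact group, since isolated photons receive a very small weight.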



3. The significance of source detections 

Campana et al. (2008) [1] studied the probability distributions of g and n in uniform random fields generated by a Monte Carlo extraction of nodes. A straightforward application of these results to the real sky, however, does not give good estimates of the probability $p_k$ of a chance detection, because of the presence of sources and of the Galactic emission, which is responsible for a non-uniform γ-ray background over large spatial scales even at high energies.

To obtain a more reliable significance estimate, we adopted the strategy of localized Monte Carlo extractions. The wide field over which MST is computed (say, 120° × 40° in Galactic coordinates) was entirely divided into many small regions (4° × 4°), and in each of them we extracted a number of nodes equal to that of the observed events but having a uniform random position inside the small region. In this way we retain the initial density distribution over the original field.
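One simple way to realise such a localized extraction is to redraw every event uniformly inside the cell that contains it, which keeps the number of events per cell unchanged. The sketch below assumes a flat grid in Galactic longitude and latitude and ignores the cos(b) area factor; the function name `local_randomize` and its parameters are hypothetical.

```python
import math
import random

def local_randomize(events, cell=4.0, rng=None):
    """Return a simulated event list with the same counts per cell.

    events: list of (lon, lat) pairs in degrees
    cell:   side of the square cells in degrees
    rng:    optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    simulated = []
    for lon, lat in events:
        # lower-left corner of the cell containing the observed event
        lon0 = math.floor(lon / cell) * cell
        lat0 = math.floor(lat / cell) * cell
        # new position drawn uniformly inside the same cell
        simulated.append((lon0 + cell * rng.random(),
                          lat0 + cell * rng.random()))
    return simulated
```

Each call produces one simulated field; repeating it with different seeds gives the ensemble from which the distributions of the cluster parameters are built.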
The MST was then computed for the entire field and clusters were selected with the same filters; then we derived the distributions of g and M. This procedure was repeated many times to improve the histograms of these distributions. An example, where the distributions in the real field are compared with those obtained from the local Monte Carlo simulations, is given in Fig. 2.

Figure 2: Upper panels: Histograms of the distributions of g (left) and M (right) in the LAT field (black) and in the local Monte Carlo simulation (red). Lower panels: Exponential best fits of the g and M simulated distributions (thick black lines) in the high value intervals.

Note in the
real field histograms the "long tails" at high values of g and M corresponding to the γ-ray sources. The significance of a source can be estimated from the distributions of simulated fields. The portions of these curves for values higher than a percentile well above the mode (e.g. 60%) do not show a large dependence on the density differences and are well represented by exponential laws, as shown in the lower panels of Fig. 2. Applying a cut at the 95th percentile to both the g and M distributions (combined with a logical OR) we made a strong selection of the clusters in Fig. 1 (right panel): practically all small elongated clusters, the majority of them having low g or M values, were rejected. Only 12 out of 55 clusters are selected as possible sources and 8 of them coincide with sources in the LAT catalog. There are a few LAT sources not found by MST, but they may have soft spectra, so that the photon number above the 4 GeV threshold is too small to produce a significant cluster. Finally, there are four selected clusters not confirmed as detected sources in the LAT catalog.
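The percentile selection can be illustrated with a short sketch. It assumes that the simulated values of g and M have already been collected in two lists and that each observed cluster is a dictionary with keys "g" and "M"; these data structures, the function names, and the linear-interpolation definition of the percentile are choices made for the example only.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def select_candidates(clusters, sim_g, sim_m, q=95.0):
    """Keep the clusters exceeding the simulated q-th percentile in g OR in M."""
    g_thr = percentile(sim_g, q)   # threshold from the simulated g distribution
    m_thr = percentile(sim_m, q)   # threshold from the simulated M distribution
    return [c for c in clusters if c["g"] > g_thr or c["M"] > m_thr]
```

A cluster is retained when it lies in the upper tail of at least one of the two simulated distributions, so a source with few but tightly packed photons and a source with many loosely packed photons can both pass.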